ABSTRACT


A performance rating scale that included 10 traits reflecting various
technical and non-technical aspects of Navy shipboard performance was developed
and used to evaluate the performance of Electrician's Mates (EM's) and Enginemen
(EN's) serving aboard submarines in the Atlantic and Pacific Fleets. Analysis of
results indicated that officers and petty officers using the scale tended:

(1)  to  agree  with  one  another  when  they  evaluated  the  same  men 

(2)  to  be  consistent  in  their  own  evaluations  from  one  time  to  the 
next 

(3)  to  discriminate  reliably  among  men  of  the  same  pay  grade 

(4)  to  differentiate,  to  an  appreciable  degree,  the  technical  from 
the  adjustive  aspects  of  shipboard  performance. 

In  addition,  a factorial  analysis  indicated  that: 

(5)  at  least  two  broad  "factors"  of  shipboard  performance  - one 
representing  technical  skill,  and  the  other,  adjustment  to  Navy 
life  - accounted  for  most  of  the  intercorrelations  among  traits 

(6) the traits representing the technical side of performance
correlated moderately high with independent measures of technical
skill, but the traits representing the adjustment side of
performance were not related appreciably to any other measures
obtained.

As a part of this overall research project, practical performance tests and
performance check lists for EM's and EN's were also developed (see Parts I and III
of this Final Report for details).

Relationships between these two measures and the rating scale are reported
and discussed in this report.


Chapter  I 


SUMMARY AND OPERATIONAL IMPLICATIONS


At the time this research was initiated, shipboard performance in the Navy
was evaluated by means of Quarterly Marks assigned by division officers. It was
considered by many, in the Navy and out, that the Quarterly Marks did not adequately
reflect the differences among enlisted men's abilities to perform their duties.

There was a marked tendency for the assigned scores to pile up around the highest
possible value (4.0). Unless it were assumed that most men's performance was of
equal excellence, these marks were failing to give discriminating and realistic
indications of performance.

For this reason, research was conducted to determine if more adequate means
of rating shipboard performance could be developed. This report, which is Part II
of a Final Report of a study of shipboard performance measures, describes a
performance rating scale that was developed for use by submarine officers and petty
officers in evaluating the performance of Electrician's Mates (EM's) and Enginemen
(EN's) serving under them.

In its final form, the performance rating scale, following a format that
permitted man-to-man comparisons, included these ten traits:


Cooperation
Knowledge of the Job
Application and Initiative
Judgment and Common Sense
Leadership
Neatness of Work
Care of Equipment
Ability to Troubleshoot
Sincerity in Doing a Good Job
Discipline


In addition, a Rank Order Preference (relative overall value in the gang)
was included as a summary evaluation.

A total of 320 EM's and 487 EN's were rated on these traits. The majority
of the men were rated by at least three officers and/or petty officers. The
reliability (consistency) of the ratings was calculated by having a representative
group of raters re-rate their men after a suitable lapse of time. It was found to
be [illegible in source]. Raters also agreed substantially with one another when
they rated the same men, the average inter-rater agreement on total score being
r = .70.
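The report gives an average inter-rater agreement on total score but does not say how the average was formed. A minimal Python sketch, assuming a simple mean of Pearson correlations over every pair of raters who rated the same men (an averaging rule chosen here for illustration, not necessarily the one used in the study); the rater labels and scores are hypothetical:

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_inter_rater_r(totals_by_rater):
    """Simple mean of Pearson r over every pair of raters.

    totals_by_rater maps a rater label to the total scores that rater gave
    the same men, listed in the same order for every rater.
    """
    pairs = list(combinations(totals_by_rater.values(), 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Hypothetical total scores for five men, each rated by three raters.
ratings = {
    "engineering_officer": [52, 61, 47, 70, 58],
    "chief_em_1": [50, 65, 45, 68, 55],
    "chief_em_2": [55, 60, 49, 72, 53],
}
print(round(mean_inter_rater_r(ratings), 2))
```

Averaging raw correlation coefficients is the simplest choice; a study could equally have averaged via a transformation of r, which would shift the figure slightly.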


The  ratings  assigned  by  all  raters  on  each  of  the  traits  were  intercorrelated 
and  analyzed  to  find  what  general  elements  (factors)  of  performance  could  account 
for  the  intercorrelations.  Two  factors  were  isolated  in  each  of  two  independent 
analyses: 

1. Technical competence (ability to meet the practical or technical demands
of a job)

2. Personal adjustment (attitude towards both job and other elements of
Navy life).


The measures obtained with the performance rating scale were also correlated
with scores from two other measures of shipboard performance: performance check
lists and practical performance tests. The check lists were designed to provide
ratings, and the performance tests objective scores, of ability to perform specific
tasks from EM's and EN's jobs.

It was found that the rating scale and check list scores were highly
correlated (about .85) when men in all pay grades were rated together (the
substantial correlation between pay grade and performance ratings in part induced
this correlation), but the correlations between rating scale trait scores and
performance test scores were low to moderate (.25 to .62). On a within-pay-grade
basis, the two rating devices correlated moderately (.23 to .61), while the rating
scale scores and performance test scores correlated low to moderately (.02 to .40).
The correlations were generally higher for the EM's than for the EN's.
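The parenthetical point above, that pay grade partly induces the pooled correlation, can be illustrated with a small Python sketch. The records are invented for illustration only (they are not data from this study) and the grade labels are hypothetical: within each grade the two measures are unrelated, yet pooling the grades yields a high correlation because both measures rise with grade.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_and_within_grade_r(records):
    """Return (pooled r, {pay_grade: within-grade r}).

    records is a list of (pay_grade, rating_scale_score, other_score) tuples.
    """
    pooled = pearson_r([r[1] for r in records], [r[2] for r in records])
    by_grade = defaultdict(lambda: ([], []))
    for grade, scale_score, other_score in records:
        by_grade[grade][0].append(scale_score)
        by_grade[grade][1].append(other_score)
    within = {g: pearson_r(xs, ys) for g, (xs, ys) in by_grade.items()}
    return pooled, within


# Hypothetical records: unrelated within each grade, both higher in "PO3".
records = [
    ("striker", 1, 2), ("striker", 2, 1), ("striker", 3, 2),
    ("PO3", 11, 12), ("PO3", 12, 11), ("PO3", 13, 12),
]
pooled, within = pooled_and_within_grade_r(records)
print(round(pooled, 2), within)
```

Reporting both figures side by side, as the study did, separates genuine agreement between measures from agreement that merely tracks pay grade.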


It was concluded that discriminating and reliable ratings of general traits
can be obtained which will reflect important aspects of shipboard performance.
Such ratings may not, however, indicate very much about ability to perform specific
tasks from shipboard jobs.

The  present  and  probable  continued  use  of  rating  devices  as  measures  of 
shipboard  performance  gives  operational  significance  to  many  of  the  findings  of 
this  study. 

Specifically: 

1. Reliable and discriminating ratings can be made by Navy officer
and petty officer personnel.

2. A man-to-man rating format appears to promote discriminating
ratings and inter-rater agreement.

3. Ratings should be performed separately for different gangs and
pay grades.

4. Both technical and adjustment aspects of shipboard performance
can be rated.

5. Inter-rater differences in leniency and spread of ratings must be
accounted for in order to achieve maximum usefulness from ratings.

6. Comparisons of the technical skill of men from different boats
demand the use of performance tests as well as ratings.


Chapter  II 


BACKGROUND AND DEVELOPMENT OF THE RATING SCALE


The problem

"In wartime ... elaborate routines of test administration aboard ship may
not be feasible, and a more convenient procedure for evaluating performance is
needed. The use of rating methods would provide the means for making such
evaluations."

The problem of evaluating the shipboard performance of Navy men is an
exceedingly important one. The success and constant improvement of selection,
training and placement programs depend in large part on the availability of suitable
performance measures which can be used as criteria against which the effectiveness
of these programs can be judged.

In the Navy, and particularly in the submarine Navy, evaluations of
performance require consideration of a complexity of factors. Technical know-how and
skill are essential; even more important, however, may be factors of an attitudinal
or adjustive nature toward Navy life and the men with whom one lives 24 hours a day.

This study is concerned with an effort to devise a rating scale which could
be used to obtain measures of the many important aspects of shipboard performance.
It, together with other studies on the problem of shipboard performance measures,
provides partial answers to this important problem.²


Human  Factors  in  Military  Efficiency,  Summary  Technical  Report  of  the  Applied 
2 See also Part I, "The use of practical performance tests in the measurement of
shipboard performance of Naval enlisted personnel," and Part III, "The use of
performance check lists in the measurement of shipboard performance of enlisted
Naval personnel," of this final report.


The Quarterly Marks System

At the time this research was initiated, shipboard performance was regularly
evaluated in the Navy through the assigning of Quarterly Marks. Enlisted men were
rated on their Proficiency in Rate and Conduct, and, depending on their job and pay
grade, on their Leadership, Seamanship, and Mechanical Ability.

The Quarterly Marks scale had a score range from 0.0 to 4.0. However, a
mark below 2.5 indicated unsatisfactory performance, and there was a tendency, in
practice, for both Division Officers and the men being rated to consider any mark
below 3.5 unsatisfactory. As a result, the marks tended to pile up between 3.5
and 4.0, with the majority for the higher pay grades at or near 4.0. Little
discrimination resulted from such ratings, and a man's scores seldom gave a clear
indication of whether the job he was doing was excellent, good, ordinary, or just
fair.


The problems faced by the Division Officer (or whoever does the rating) in
assigning Quarterly Marks may be illustrated by a few extracts from the BuPers
Manual:

In determining marks, it shall be borne in mind that the mark
for proficiency in rate is intended to be sufficient in itself
to denote a person's ability, habits, and character; in short,
the individual's value to the service in the particular rate.³


The following qualifications are then listed as the determining factors in
assigning marks of 4.0 and 3.5:

4.0 Competent and reliable in rate. Not less than 3.5 in conduct.

3.5 Competent and qualified in all duties of rate; has qualities
sufficient to justify advancement. Not less than 3.5 in conduct.⁴

3 Bureau of Naval Personnel Manual, pp. 181-102.

4 Ibid., p. 183.
This evident lack of distinction between the marks, and the emphasis put on
a single, broad performance characteristic, may account in part for the fact that
most of the Quarterly Marks were at, or near, 4.0.

With this brief critique of the characteristics of Quarterly Marks, let us
pass on to the considerations which governed the design of the rating scales
developed for this study.

Considerations for the design of a rating scale

A rating scale is a device that should help an officer or petty officer
organize and express his evaluations of the men who work under him. To be
useful, such evaluations depend to a great extent on three things:

1. The format of the scale. Does it provide reference points by which
the rater can reliably determine where each man should be placed on the
scale?

2. The way relevant aspects of performance are described. Do they help
the rater "break down" the job? Do they bring out the different
features of performance? Do the trait descriptions help the rater
visualize the man on the job?

3. The way "levels of performance" are defined. Do they help the rater
separate the "good" from the "not so good", and the "superior" from the
"above average"?

Requirements  of  rating  scale  construction  and  design 

Investigation of the rating scale literature and consideration of the goals
of this research have led to the following general requirements which affect
rating scale construction and design:

1. A rating scale should be designed to reflect, insofar as possible, the
differences among the performance levels of the men being rated. That
is, ratings given to a group of men should be spread over a relatively
large portion of the scale and few men, if any, should receive the same
score. The scale should yield scores that discriminate.⁵

2.  A rating  scale  should  be  designed  so  that  the  "average  rating"  of  a 
typical  group  of  men  falls  somewhere  near  the  middle  of  the  scale. 

The bunching of scores at the upper end of the scale, so typical of
ratings, is unrealistic and reduces discrimination as well. (It will
be remembered that the Quarterly Marks had a tendency to be bunched at
the upper end of the scale. This was a fault of the scale design as
well as a result of overly lenient rating practices.)

3.  A rating  scale  should  be  designed  so  that  different  raters  will  agree 
on  the  position  occupied  on  the  scale  by  each  man  rated.  This  is  very 
difficult  to  achieve  when  raters  are  forced  to  rely  on  descriptive 
adjectives  or  numerical  values  in  establishing  key  points  on  the  scale. 
More  complete  verbal  descriptions  of  observable  bits  of  behavior  should 
be  of  some  help  to  raters  in  agreeing  on  the  positions  their  men  occupy. 
However,  the  format  of  the  rating  device  can  also  be  such  as  to  promote 
agreement. 

4. The design of a rating scale should be such that it will help the
raters using it to be consistent in their ratings of the same men from
one rating period to the next. This is reliability in the test-retest
sense. Assuming raters had no recollection of scores given at an
earlier time, there still should be a high relationship between scores
given to a group of men at intervals of, say, 90 to 180 days.

5 This assumes, of course, that men are actually different from each other in the
performance characteristics on which they are to be rated. From present-day
knowledge of individual differences this appears a reasonable assumption.

With  these  requirements  in  mind,  the  actual  construction  of  a performance 
rating  scale  was  begun. 

Format of the experimental scale

Rating scales typically are single-page arrangements with the name of the
man to be rated at the top of the page and a series of traits on which he is to be
rated listed down the side. Performance levels on this type of rating form are
usually expressed in terms of some scale of numbers or adjectives, or both. For
example, a man may be rated as "excellent" or "4.0" on one characteristic, or
"good" or "3.5" on another, and so forth. This kind of rating form, though simple
in design and easy to use, minimizes the likelihood of obtaining useful ratings.
It fails to provide the raters with a satisfactory framework for comparison. The
adjectives or numbers used usually have quite different meanings to different
raters at different times. Either or both of these conditions lead to unreliable
ratings. Words and numbers on a rating scale lack absolute meaning and are thus
subject to as many interpretations as there are raters.

It is exceedingly difficult to state how absolutely good or poor a man is
in his work. It is much easier to say he is good compared to this man or poor compared
to that one. The more concrete and meaningful the reference points against which
these necessary comparisons must be made, the more objective and reliable will be
the resulting ratings. Something more than numbers or adjectives is needed, although
they may be of limited help. The history of ratings suggests that this something
more may be achieved by evaluating a man's performance relative to that of his
peers.

In most job situations, men belong to an identifiable work group. Each man
within the group either does work similar to that of the others, or works under
approximately the same conditions, or both. The members of this group provide
meaningful, stable points of reference for all raters who are sufficiently well
acquainted with the group to do the rating.

A man's assigned rank in a group on any one trait, or his average rank in a
group on all traits, is an indication of his level of performance based on
meaningful reference points: namely, the other members of the work group.
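The rank-within-group idea above can be sketched in Python. This is an illustration only, assuming each man's score on a trait is the numerical reading from his column and that tied scores share the average of the rank positions they occupy (a tie rule the report does not specify); the trait names come from the scale, but the men and scores are hypothetical.

```python
def ranks_within_group(scores):
    """Rank the men of one work group on one trait (1 = highest rated).

    scores maps each man's name to his rating on that trait; tied scores
    share the average of the rank positions they occupy.
    """
    order = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # extend j over every man tied with order[i]
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for name in order[i:j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks


def average_rank(trait_scores):
    """trait_scores maps trait -> {man: score} for one work group."""
    per_trait = [ranks_within_group(s) for s in trait_scores.values()]
    men = per_trait[0]
    return {m: sum(r[m] for r in per_trait) / len(per_trait) for m in men}


# Hypothetical three-man gang rated on two traits.
gang = {
    "Cooperation": {"Adams": 10, "Baker": 7, "Clark": 5},
    "Discipline": {"Adams": 6, "Baker": 9, "Clark": 4},
}
print(average_rank(gang))
```

Because each rank depends only on the other men in the same gang, the result carries the peer-based meaning the passage describes, independent of how lenient a given rater's raw scores are.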

The performance rating scale was designed with this man-to-man comparative
procedure in mind.⁶ The following features were considered important. (A copy
of the final form of the rating scale may be seen in Appendix B.)

1. Space was provided on each page of the rating scale booklet for the
names of all men within a given work group. Each rater rated each man
as a member of a particular work group. In this study the group was
either the EM or EN gang aboard submarines.⁷

2.  A single  performance  trait  was  listed  on  each  page  of  the  booklet.  All 
men  were  therefore  rated  as  a group  on  ONE  trait  at  a time. 

This procedure of rating a group of men on each trait separately
is to be contrasted with the usual procedure of rating one man on all
traits without direct reference to the other persons to be rated.

3. The definition of each trait was followed by four series of statements
down the side of the page defining various degrees of possession of the
trait in question. The top-most group of statements described
exceptionally outstanding behavior. The next group described
clearly above average performance; the third group described
average or satisfactory performance; and the last group of
statements described below average or undesirable performance.

Thus, three of the four groups of statements described
acceptable performance or better. This was done in an effort to
keep the average rating down toward the physical center of the
scale, recognizing both the tendency of raters to rate their
men high and, at the same time, the natural selection and
admittedly superior performance of most higher rated petty officers.

4. In a column underneath each man's name was a continuum line.
This line was unbroken except for short horizontal marks which
located on the continuum the position of each group of
descriptive statements. The raters could thus locate where, on the
line, a given description of behavior would fall. They could
then decide whether the performance of a man being rated was
equal to, better than, or worse than a particular description and rate
him at any appropriate point along the entire continuum.

On each page of the rating scale booklet, therefore, appeared
a definition of the trait on which the men were to be rated, four
statements defining the degree of possession of the trait, and
the names of the men being rated. Below each of the latter was
a column, representing a continuum, in which the ratings were to
be made with a check mark.

Each time a man was rated on a given trait, the ratings
assigned other men had to be taken into consideration by the
rater. Thus, each man was directly compared with each other man
in the group on each of the traits.

6 The design was not completely original. It was first tested by Stevens and
Wonderlic, "An Effective Revision of the Rating Scale Technique," Personnel
Journal, 13; 1934, 125-134. It was recommended by Guilford, Psychometric Methods,
McGraw-Hill Book Co., New York, 1936, 263-264. It was further tested by Gilinski,
"The Influence of the Procedure of Judging on the Halo Effect," American
Psychologist, 2:309-319, 1947.

7 In subsequent work, groups have been homogeneous with respect to pay grade as
well as rate.


The selection of performance traits for the rating scale

To explore as many facets of shipboard performance as possible, a wide
selection of traits was made for the preliminary form of the rating scale.
These traits, which are listed below, were gleaned from a study of shipboard
jobs and from conversations with Naval personnel:


Social Adjustment
Quality of Work
Neatness of Appearance
Adaptability
Leadership
Overall Efficiency
Cooperation
Watch Standing
Knowledge of the Job
Discipline
Application and Initiative
Dependability
Neatness of Work
Ability to be Taught
Care of Equipment
Ability to Troubleshoot
Sincerity in the Job
Manual Skill
Overall Efficiency in Rate (within pay grade)


In addition to an overall rank order preference, the final form of the
rating scale consisted of ten traits. These were:

Cooperation
Knowledge of the Job
Application and Initiative
Judgment and Common Sense
Leadership
Neatness of Work
Care of Equipment
Ability to Troubleshoot
Sincerity in Doing a Good Job
Discipline


Details of the analysis that led to this final selection of traits are
reported in Chapter III.


Use of the performance rating scale aboard ship

As an exploratory study, the preliminary rating scale was used to obtain
evaluations of 107 EM's and EN's of Submarine Squadrons 3 and 7 at San Diego.
All pay grades from strikers through chief petty officers (CPO's) were included.
The men rated were selected solely on the basis of availability.

A revised preliminary form was used later to obtain evaluations of
206 submarine EM's and EN's. Analysis of these two preliminary forms led to
the development of the final form, which is included in Appendix B to this
report.

The final form of the rating scale was used to evaluate the same groups
of EM's and EN's that were subsequently evaluated with the performance check
lists and tested with the practical performance tests described in Parts I and
III of this report. These groups included 320 EM's and 407 EN's, the great
majority of the men being strikers and third class petty officers.

An effort was made to get as many raters as possible who felt they
knew the men to be rated well enough to rate them fairly. In most instances,
this meant the Engineering Officer, the two leading Chief EM's, and the two
leading Chief EN's. In some instances, ratings were made by the Executive
Officer or Assistant Engineering Officer.

The majority of the EM's and EN's were rated by either one officer and
two petty officers or by two officers and one petty officer; so, typically,
at least three ratings were obtained on each man.

Instructions to the raters

Written instructions, together with sample ratings, were given to
all raters. (Examples may be seen on pages 1 and 2 of the rating scale in
the Appendix.) In addition, in the preliminary study raters were instructed
personally by project personnel. Common pitfalls of rating, such as excessive
leniency, halo effect, and high interrelationships among traits due to logical
errors, were explained to the raters in simple terms and they were asked to
guard against them. During later administrations, time considerations prevented
personal instruction and the written instructions had to suffice.

The  written  instructions  emphasized  the  following: 

1. The differences, no matter how small, in the performance
levels of the men being rated should be brought out. Men
in each work group should be compared with one another on
each trait. It was suggested that the poorest and best
man in each group ("best" or "poorest" on the trait in question)
should be rated first, and then the other men rated in between.
Raters were instructed to avoid giving tie ratings if at all
possible.

2. The men's pay grades should not be unduly influential in
the ratings assigned. Men were to be rated on each
characteristic strictly on the basis of their demonstrated
performance. Raters were cautioned against the general "halo effect"
of pay grade.

The  halo  created  by  pay  grade  is  a real  problem  in  analyzing  ratings 
given  to  Naval  personnel.  It  is  natural  for  an  officer  or  petty  officer  to  be 
affected  by  the  man’s  pay  grade  when  he  evaluates  a man’s  performance.  Part 
of  this  influence,  no  doubt,  is  the  result  of  genuine  differences  in  ability 
reflected  in  the  different  pay  grades,  but  part  very  likely  is  spurious. 

The effect of pay grade on the usefulness and reliability of ratings
was examined in considerable detail during this study. Most statistical
analyses reported in later chapters were performed on both a within- and an
across-pay-grade basis in order to answer the many questions involved.
Scoring the ratings


It will be remembered that each definition of a performance trait on
the pages of the rating scale booklet was accompanied by four groups of
statements down the side of the page defining the degree of possession of that
trait. Across the top of the page were listed the names of the men to be rated.
A check mark in a column opposite one of the series of qualifying statements,
or between two such series, and under a particular man's name, indicated that
man's rated performance level on a particular trait.

The distance of the check mark from the lower end of the column was
converted into a numerical score by measuring it with a centimeter rule. All
columns were thirteen centimeters long, and so possible scores ranged from
0.0 to 13.0. Each check mark was read to the nearest half centimeter.
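The scoring rule above amounts to a simple conversion, sketched here in Python. The function name is hypothetical, and the handling of exact quarter-centimeter ties (rounded upward) is an assumption, since the report only says marks were read to the nearest half centimeter.

```python
from math import floor

COLUMN_LENGTH_CM = 13.0  # every rating column was 13 cm long


def raw_score(distance_cm):
    """Raw score for one check mark, read to the nearest half centimeter.

    distance_cm is the measured distance from the lower end of the column.
    Exact quarter-centimeter ties are rounded upward here (an assumption).
    """
    if not 0.0 <= distance_cm <= COLUMN_LENGTH_CM:
        raise ValueError("check mark lies outside the 13 cm column")
    return floor(distance_cm * 2 + 0.5) / 2


print(raw_score(7.3), raw_score(7.2))
```

Working in doubled units (half centimeters) and rounding there keeps every score on the 0.0, 0.5, ..., 13.0 grid.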

These raw scores varied a good deal from rater to rater in central
tendency and spread. To make ratings from different raters comparable, it
was necessary to convert them into standard scores.⁸ This had the effect of
producing a common average for all ratings and reduced the effect of
differential spread.


8 Standard scores were expressed in terms of the Sten scale. Scores on this
scale ranged from 0 to 9 inclusive, with the mean at 4.5. Each unit on the
scale covered one-half standard deviation.
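The per-rater conversion described above can be sketched in Python. The report fixes only the range (0 to 9), the mean (4.5), and the unit width (one-half standard deviation); the exact banding rule here is an assumption: bands bounded at every half standard deviation, the rater's own mean on the boundary between Stens 4 and 5, open-ended extreme bands, and the population standard deviation of that rater's scores.

```python
from math import floor, sqrt


def to_stens(raw_scores):
    """Convert one rater's raw scores to Stens on a 0-9 scale.

    Each Sten unit spans half a standard deviation and the rater's own mean
    falls on the boundary between Stens 4 and 5, so the scale mean is 4.5.
    Uses the population standard deviation of this rater's scores.
    """
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in raw_scores) / n)
    if sd == 0:
        raise ValueError("rater gave every man the same score; nothing to standardize")
    stens = []
    for x in raw_scores:
        z = (x - mean) / sd
        # floor(2z + 5): Sten 5 covers 0 <= z < 0.5, Sten 4 covers -0.5 <= z < 0, ...
        stens.append(min(9, max(0, floor(2 * z + 5))))
    return stens


# A lenient and a strict rater with the same spread get identical Stens.
print(to_stens([8, 9, 10, 11, 12]), to_stens([1, 2, 3, 4, 5]))
```

Standardizing each rater separately is what removes differences in leniency; the usage line shows two raters whose raw means differ by seven points mapping to the same Stens.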
Chapter III


RESULTS  FROM  SHIPBOARD  USE  OF  THE  RATING  SCALE 
A. Measuring Characteristics of the Scale

Average rating and spread of ratings over the scale

In Table I, appearing on page 17, the mean and standard deviation of
ratings are reported for each trait of the preliminary form of the rating
scale. This analysis was not repeated for the revised form, there being no
suggestion of any differences in results. If anything, the variability of
ratings on the revised form probably was slightly greater than that reported
here because of the specific instruction in later administrations to avoid
tie ratings. To indicate the variability of the means and standard deviations,
the middle [illegible] range of values has been included in the Table.

In many rating scales there is a notable tendency for raters to reduce
the usefulness of their evaluations by:

(1)  Assigning  high  ratings,  on  the  average 

(2)  Indicating  little  or  no  differences  between  men  (small 
dispersions). 

As  indicated  in  the  previous  chapter,  it  was  thought  that  the  format  of  this 
scale,  together  with  the  manner  in  which  the  trait  descriptions  were  graduated, 
would  reduce  both  these  tendencies  appreciably. 

In large part these expectations were realized. There was, of course,
variability in the leniency-stringency factor, mean ratings for individual
raters extending all the way from 5.9 to 10.0 on the 13-point scale. However,
the mean of mean ratings for 43 raters in the preliminary study was 7.0, quite
acceptably near the center of the continuum, 6.5.
In general, the dispersions of ratings were sufficiently large to
provide meaningful and useful discriminations. There were a few raters who
assigned a large percentage of tie ratings, thus producing small standard
deviations and little discrimination. This led to the instruction, for later
administrations, to avoid tie ratings if at all possible.
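The leniency, spread, and tie checks discussed above can be summarized per rater with a short Python sketch. The function names and the definition of "tie share" (the fraction of men who share their raw score with at least one other man) are assumptions for illustration; the 6.5 center comes from the 0 to 13 continuum described earlier.

```python
from collections import Counter
from math import sqrt

SCALE_CENTER = 6.5  # midpoint of the 0-13 continuum


def rater_summary(raw_scores):
    """Leniency, spread, and tie summary for one rater's raw scores."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in raw_scores) / n)
    counts = Counter(raw_scores)
    tied = sum(c for c in counts.values() if c > 1)  # men sharing a score with someone
    return {"mean": mean, "sd": sd,
            "offset_from_center": mean - SCALE_CENTER,
            "tie_share": tied / n}


def mean_of_means(scores_by_rater):
    """Average of the raters' individual mean ratings."""
    means = [rater_summary(s)["mean"] for s in scores_by_rater.values()]
    return sum(means) / len(means)


# Hypothetical raw scores from one rater.
print(rater_summary([6.5, 6.5, 8.0, 5.0]))
```

A large tie share paired with a small standard deviation flags exactly the kind of rater whose ratings gave little discrimination.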

Considerable variability can be seen in the means and spread of scores
from trait to trait. This could be due to any one of several factors, or in part
to all of them. One possibility is that the variability resulted from
differences in the extremeness of phraseology of the descriptive phrases for
the several traits. A second possible explanation is that raters may have
considered it more logical or essential to assign high mean ratings for some
traits than others. For example, a rater might reason that it was perfectly
satisfactory to assign moderate or even low ratings in a trait such as
Manual Skill, but might consider it a reflection on his own performance if
anything but high ratings were assigned in a trait such as Discipline.
Finally, a third possible explanation of differences in means and
variabilities of the several traits is that they reflect differences which, in fact,
do exist. For example, the mean score in Leadership may be low because a
substantial proportion of the men rated actually have had no opportunity to
display leadership. The mean rating in Discipline may be high because the
majority of men offer no disciplinary problems. The variability in Manual
Skill may be small because there is little opportunity to demonstrate
differences in motor skills in the particular rates studied.
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Reliability  of  the  ratings 


An important feature of any evaluating instrument is the consistency with which it measures. Few reports on rating scales of the traditional type indicate satisfactory reliability and, in many studies, reliability in the rate-rerate sense is not even reported.

Having established that raters could assign meaningful and discriminating ratings, it was next desired to ascertain how consistently (reliably) they did this.

One hundred fifteen subjects were common to the two preliminary administrations of the rating scale and also had the same raters each time. These were non-selected cases from six submarines, being singled out solely because of participation in both trial runs. The elapsed period of time between the first and second ratings varied from five to as much as nine months for the various raters.

Since fairly high reliability would be expected on the basis of knowledge of a man's pay grade alone, it was decided that the reliability should be estimated on a within pay grade basis. That is, for each pay grade, it was determined whether a given rater placed his men in essentially the same order (based on total score) during the second evaluation as he did on the first. The self-agreements and disagreements then were plotted in a four-fold table (split at the median) and the correlation (tetrachoric) was computed. For the 115 cases (over 300 ratings) the correlation was rt = .80. This compares very favorably with reliability figures usually obtained with more objective measuring devices, and exceeds the figures usually attributed to rating scales (.60 to .70).
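The within pay grade rate-rerate check can be sketched as below. The report does not say how its tetrachoric coefficients were calculated; this sketch assumes the cos-pi approximation, r_t ≈ cos(π / (1 + √(ad/bc))), a common stand-in for the full tetrachoric solution. The function names, the pay grade labels, and the example scores are all hypothetical.

```python
import math
from statistics import median


def cos_pi_tetrachoric(a, b, c, d):
    """Approximate the tetrachoric r of a fourfold table (cos-pi formula).

    Assumed layout:      second high   second low
        first high           a             b
        first low            c             d
    """
    if b * c == 0:
        return 1.0   # no disagreements at all
    if a * d == 0:
        return -1.0  # no agreements at all
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


def split_within_grade(scores):
    """scores: list of (man, pay_grade, total_score) from one rater.

    Returns {man: True if above his own pay grade's median}. A man alone in
    his grade is skipped; a score exactly at the median counts as low.
    """
    by_grade = {}
    for man, grade, score in scores:
        by_grade.setdefault(grade, []).append((man, score))
    high = {}
    for members in by_grade.values():
        if len(members) < 2:
            continue
        m = median(score for _, score in members)
        for man, score in members:
            high[man] = score > m
    return high


def rate_rerate_table(first, second):
    """Fourfold table of self-agreement between two ratings by one rater."""
    h1, h2 = split_within_grade(first), split_within_grade(second)
    a = b = c = d = 0
    for man in h1.keys() & h2.keys():
        if h1[man] and h2[man]:
            a += 1
        elif h1[man]:
            b += 1
        elif h2[man]:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical example: one rater, six men, two rating occasions.
first = [("m1", "PO3", 60), ("m2", "PO3", 50), ("m3", "PO3", 55),
         ("m4", "PO3", 45), ("m5", "PO2", 70), ("m6", "PO2", 64)]
second = [("m1", "PO3", 58), ("m2", "PO3", 52), ("m3", "PO3", 57),
          ("m4", "PO3", 44), ("m5", "PO2", 66), ("m6", "PO2", 68)]
table = rate_rerate_table(first, second)
print(table, round(cos_pi_tetrachoric(*table), 2))  # (2, 1, 1, 2) 0.5
```

With real data the tallies from every rater and every pay grade would be pooled into one table before the coefficient is computed, as the text describes.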


Agreement among the raters

In addition to reliability in the rate-rerate sense, a useful rating device must promote substantial agreement among raters on the relative proficiency of the men they are rating. Indeed, ratings probably can be considered a satisfactory criterion of performance in direct proportion to the amount of agreement among raters, provided, of course, that the characteristics of performance being rated are relevant to the ultimate criterion.

Inter-rater agreement was determined by correlating one rater's judgments on each trait with those of each other rater who had rated the same subjects. Thus, if a pair of raters both considered a man either above or below the median on a certain trait, it constituted an agreement. If, on the other hand, one rater considered the man above the median and another considered him below it, a disagreement was recorded. By considering the regression of each rater on each other rater, a four-fold table of frequencies representing agreements and disagreements was established for each trait in the scale and for average score. From these four-fold tables, tetrachoric correlation coefficients were computed which served as indices of agreement.

The question again arose as to the role of pay grade in determining the magnitude of inter-rater agreement. For example, it seemed a reasonable assumption that most raters would generally agree that a first-class petty officer was better than a third-class petty officer or a striker. Thus the factor of pay grade alone would create considerable agreement among raters. It was considered necessary, therefore, to determine whether there was any basis for agreement among raters independent of the factor of pay grade.

Each rater's subjects were grouped by pay grade. On each trait these pay-grade groups were divided into upper and lower sub-groups with the division at the median. (If there was only one subject in any given pay grade, he was omitted from the calculation.) Those subjects in the upper sub-group were assigned a passing score (+) while those in the lower sub-group received a failing score (0).

With these scores assigned for each rater for each trait for each pay grade, the intercorrelation of raters' judgments was again a matter of plotting a four-fold table of frequencies. All pay grades were plotted on the same table, and each subject appeared in the table as many times as there were pairs of raters who judged him.
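The within pay grade scoring can be sketched in the same way. Each rater's men are split at the median of their own pay grade, a man who is alone in his grade is dropped, and the pass/fail scores from all grades go into one pooled table. The function names, the 1/0 coding of pass and fail, the ordered-pair counting, and the example data are illustrative assumptions.

```python
from itertools import permutations
from statistics import median


def within_grade_scores(ratings, grades):
    """ratings: {man: score} from one rater on one trait; grades: {man: pay_grade}.

    Returns {man: 1 (upper sub-group) or 0 (lower sub-group)}; a man who is
    the only member of his pay grade for this rater is omitted.
    """
    groups = {}
    for man, score in ratings.items():
        groups.setdefault(grades[man], []).append((man, score))
    scores = {}
    for members in groups.values():
        if len(members) < 2:
            continue
        m = median(score for _, score in members)
        for man, score in members:
            scores[man] = 1 if score > m else 0
    return scores


def pooled_within_grade_table(raters, grades):
    """One fourfold table (a, b, c, d) for a trait, all pay grades pooled."""
    passed = {r: within_grade_scores(s, grades) for r, s in raters.items()}
    a = b = c = d = 0
    for r1, r2 in permutations(passed, 2):
        for man in passed[r1].keys() & passed[r2].keys():
            x, y = passed[r1][man], passed[r2][man]
            if x and y:
                a += 1
            elif x:
                b += 1
            elif y:
                c += 1
            else:
                d += 1
    return a, b, c, d


# Hypothetical data: the lone chief (m5) is dropped from the calculation.
grades = {"m1": "PO3", "m2": "PO3", "m3": "PO2", "m4": "PO2", "m5": "CPO"}
raters = {
    "A": {"m1": 4, "m2": 3, "m3": 8, "m4": 7, "m5": 12},
    "B": {"m1": 5, "m2": 2, "m3": 9, "m4": 8, "m5": 11},
}
print(within_grade_scores(raters["A"], grades))
print(pooled_within_grade_table(raters, grades))  # (4, 0, 0, 4)
```

Any agreement left in such a table cannot come from pay grade itself, since every man is compared only with others of his own grade.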

The coefficient computed from this table may be termed a within pay grade coefficient. The results for each trait, under each of the two conditions, and for both of the preliminary samples, are shown in Table II on the following page. In addition to the agreement on the single traits, the agreement on average scores is indicated. The values for the traits of Neatness of Appearance, Dependability, Adaptability, Ability to Troubleshoot, and Manual Skill do not appear for the second sample since those traits were eliminated from the second form of the scale.

Several of the results reported in Table II require comment. There is considerable variability in the size of the coefficients from trait to trait, as might be expected. Considering both samples and both conditions of correlation, the trait Social Adjustment seems to be about the least objective. This is reasonable in view of the obvious difficulties involved both in defining Social Adjustment and in agreeing on objective indicators for such a trait.


Among the traits leading to the highest degree of agreement under both conditions of correlation were Knowledge of the Job, Discipline, Judgment and Common Sense, and Leadership. It is felt that the behavioral referents for these traits are much easier to define and their manifestations easier to observe, which may explain the greater degree of agreement on them. Further, they are all traits which reflect a basic ability to perform and, as such, may tend to correlate with each other.

TABLE II

INTER-RATER AGREEMENT* ON THE VARIOUS RATING SCALE TRAITS
AND ON AVERAGE SCORE, FOR BOTH SAMPLES, WITH PAY GRADE
IN AS A VARIABLE AND WITH PAY GRADE HELD CONSTANT

                                     First Sample (N=167)     Second Sample (N=286)
Trait                               Pay Grade   Pay Grade    Pay Grade   Pay Grade
                                       In          Out           In          Out

 1. Social Adjustment                  .32         .26          .46         .40
 2. Quality of Work                    .53         .56          .55         .39
 3. Neatness of Appearance             .42         .42           -           -
 4. Cooperation                        .30         .58          .50         .57
 5. Watch Standing                     .45         .44          .60         .59
 6. Knowledge of Job                   .66         .61          .78         .50
 7. Discipline                         .60         .87          .62         .58
 8. Application and Initiative         .49         .59          .60         .54
 9. Judgment and Common Sense          .61         .67          .69         .46
10. Dependability                      .47         .43           -           -
11. Adaptability                       .53         .63           -           -
12. Leadership                         .60         .55          .72         .56
13. Overall Efficiency                 .54         .62          .65         .51
14. Neatness of Work                   .51         .48          .64          ?
15. Ability to be Taught                ?           ?           .55         .42
16. Care of Equipment                  .42         .61          .55         .35
17. Ability to Troubleshoot            .60         .54           -           -
18. Sincerity in Job                   .44         .60          .60         .44
19. Manual Skill                       .58         .45           -           -
20. Overall Proficiency in Rate        .57         .63          .66         .40

Average Rating Scale Score             .67         .73          .74         .59

* Tetrachoric correlation coefficients.
? Illegible in source.

The factor of pay grade raised or lowered the inter-rater agreement depending upon circumstances. In the first sample, the removal of pay grade variance caused the majority of the coefficients to increase, while a few remained the same and a few decreased in size. In general, the increase in agreement occurred with traits which are not so highly related to pay grade. Thus it may have been that instructions designed to decrease intertrait correlations and reduce halo were somewhat effective with the preliminary sample, and this fact showed up in the inter-rater agreement. It was apparent, in scoring the ratings, that pay grade was affecting various raters differently. Some were overcome by its influence and rated strictly in accordance with pay grade level on every trait. Others remained more objective, their ratings in many traits showing much smaller relationships to pay grade.

If both kinds of raters picked the same men within pay grade level as best and poorest, then removal of the effect of pay grade should increase inter-rater agreement. It is believed that this is what happened in the first sample.


In the revised preliminary sample, however, the general effect of removing pay grade variance was to reduce the inter-rater agreement. It is possible that this reversal of the earlier results was due to the instructions (this time the lack of them) given to the raters. Since there was practically no personal contact with the raters during the second administration of the rating scale, and since many of the raters in the second sample had not participated in the first ratings, it is reasonable to hypothesize that the logical and halo effects created by the presence of pay grade were greater in the second sample than in the first. This hypothesis is substantiated by the fact that the correlations of pay grade with other rating scale variables were higher in the second sample than in the first, and also by the fact that trait intercorrelations were higher in general for the second sample. It is suggested, then, that pay grade variance contributed relatively more to the magnitude of the inter-rater agreement in the second sample, and that its removal caused a reduction in agreement.

In concluding the discussion on the agreement of raters, a word is in order about the level of agreement obtained. In general it was more substantial, both on individual traits and on average score, than is usually reported with conventional scales. Undoubtedly exclusion of some of the less objective traits would have increased the agreement on average score appreciably.

Intercorrelations of traits and results of factorial analysis

A survey of the literature on rating scales revealed few studies(1,2,3) which had been carried as far as factorial analysis. These studies, with the exception of that by Bolanovich, plus well established indications of rating fallacies such as halo effect and logical errors, leave one with the impression that seldom are more than one or two factors required to account for the high inter-trait correlations which typically are found in rating scale studies.


(1) Bolanovich, D. J., "Statistical analysis of an industrial relations chart," J. of Appl. Psychology, 30, 1946, 22-31.

(2) Chi, Pan-Lin, "Statistical analysis of personality ratings," J. of Experimental Education, 5, 1937, 229-245.

(3) Ewart, E., Seashore, S. E. and Tiffin, J., "A factor analysis of an industrial merit rating scale," J. of Appl. Psychology, 25, 1941, 481-486.

A part of the analysis of the present rating scale was to determine the magnitude of the inter-trait correlations and to perform factorial analyses of the resulting matrices.* This was done for each of two preliminary samples of submarine personnel, under three conditions for each sample, with a resultant total of six analyses. The various conditions of these analyses are described below.


Intercorrelation of traits. Procedure I. In the initial analysis, traits were intercorrelated in a straightforward manner. Individuals were assigned plus scores in all traits in which they received average STEN-scores of five or better, and minus scores in all traits in which they received STEN-scores of four or less. Tetrachoric coefficients were then computed.
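Procedure I's scoring can be sketched as follows. The report does not give its STEN conversion; this sketch assumes the usual standard-ten convention (bands half a standard deviation wide, the mean falling between stens 5 and 6, clipped to 1 through 10), computed separately for each rater on each trait, and then averages a man's STEN-scores across his raters before applying the five-or-better cut. Names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def sten_scores(raw):
    """Convert one rater's raw ratings on a trait to STEN-scores (1-10).

    Assumed convention: sten = floor(2z + 6), clipped to 1..10, so the
    rater's own mean falls on the boundary between stens 5 and 6.
    """
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("all ratings tied; STEN-scores are undefined")
    return [min(10, max(1, math.floor(2 * (x - mu) / sd + 6))) for x in raw]


def plus_or_minus(stens_from_raters):
    """'+' if the man's average STEN-score is five or better, else '-'."""
    return "+" if mean(stens_from_raters) >= 5 else "-"


print(sten_scores([2, 4, 6, 8]))                      # [3, 5, 6, 8]
print(plus_or_minus([5, 6]), plus_or_minus([5, 4]))   # + -
```

The text speaks of "five or better" and "four or less"; an average such as 4.5 falls between the two, and this sketch simply treats it as a minus.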

In addition to the rating scale traits, each man's General Classification Test score (GCT), his age, length of time on board, education and pay grade level were included in the matrix for analysis.


In the Navy, pay grade is quite naturally related to most traits that make for success. This fact tends to increase the correlation between traits on a rating scale such as the one under discussion. As in any organization, those individuals who are at the most advanced levels are rated higher because of their positions and, in turn, they have been advanced to those levels because they possess more desirable amounts of the traits that are necessary for advancement.


In view of this, and the relative naivete of the raters, one would expect the intercorrelations among the traits to be quite high. This expectation was realized, as the matrices in Table III (first sample) and Table IV (second sample) indicate. (Because of their size, these and the other correlational matrices, Tables V, VI, VII and VIII, appear in Appendix A.) The intercorrelations in the second sample were somewhat higher than those obtained from the first sample. This was probably due to less personal instruction of the raters in the second sample, and to the greater dispersion of scores as a result of the instruction that no tie scores be given.

* Tables III through VIII, the intercorrelation matrices, will be found in Appendix A.

[A word is in order in regard to the reliability of the coefficients in these and subsequent calculations. The standard errors are smaller than the N's of 167 and 286 for the two samples would indicate. This is true because the cell frequencies in the four-fold tables did not total N, but rather added up to the total number of pairs of ratings. Since there were two to three raters for every ratee, this total was considerably greater than N.]

Inspection of Tables III and IV indicates that pay grade is highly associated with technical skill as reflected in the traits of Knowledge of the Job, Judgment and Common Sense, Dependability, Leadership, Ability to Troubleshoot, and Age. Less highly related to pay grade is a group of traits which is considered to indicate the degree of personal adjustment to Navy life. In this group are Social Adjustment, Neatness of Appearance, Cooperation, Discipline, Application and Initiative, Ability to be Taught, and Neatness of Work. Length on Board is seen to have a small relationship to most traits, and Education and GCT have practically none. (Submarine personnel are pre-selected on GCT.)




Intercorrelation of traits. Procedure II (Variance within pay grade). Because of the substantial relationship of pay grade to most of the rating scale trait scores, and because of the great range of abilities represented in groups containing everyone from strikers to chief petty officers, within pay grade intercorrelations were computed next, treating a man's scores as high or low relative only to his own pay grade average. If some factor(s) other than pay grade was in part producing the intercorrelations, then the resulting matrices should contain significant coefficients. If, however, pay grade were the only variable operating to produce the intercorrelations, then the matrices would contain only near-zero coefficients.
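The logic of Procedure II can be illustrated with simulated data. In this hypothetical sketch, two traits depend only on pay grade plus independent noise, while two others also share a component unrelated to pay grade. Correlating scores expressed relative to each pay grade's average should drive the first pair toward zero but leave the second pair clearly positive. The report dichotomized scores and computed tetrachoric coefficients; for brevity this sketch keeps the scores continuous and uses the product-moment r.

```python
import random


def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def relative_to_grade(values, grades):
    """Express each value as a deviation from the mean of its own pay grade."""
    totals, counts = {}, {}
    for v, g in zip(values, grades):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, grades)]


random.seed(1)
grades = [random.randint(1, 5) for _ in range(500)]  # hypothetical grade levels
shared = [random.gauss(0, 1) for _ in grades]        # a non-grade factor

# Traits a, b: pay grade plus noise only. Traits c, d: also share `shared`.
a = [2 * g + random.gauss(0, 1) for g in grades]
b = [2 * g + random.gauss(0, 1) for g in grades]
c = [2 * g + s + random.gauss(0, 1) for g, s in zip(grades, shared)]
d = [2 * g + s + random.gauss(0, 1) for g, s in zip(grades, shared)]

r_ab_total = pearson(a, b)
r_ab_within = pearson(relative_to_grade(a, grades), relative_to_grade(b, grades))
r_cd_within = pearson(relative_to_grade(c, grades), relative_to_grade(d, grades))
print(round(r_ab_total, 2), round(r_ab_within, 2), round(r_cd_within, 2))
```

With these settings a and b should correlate at roughly .9 overall and near zero within grade, while c and d should stay near .5 within grade; the exact figures depend on the random draw.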

In Tables V and VI it is possible to obtain a very general picture of the effect of removing pay grade variance. It will be noted that either method of removing pay grade reduced the intercorrelations of the traits appreciably, but did not by any means reduce them to near-zero. The amount of reduction can be discerned from the sums of the coefficients of the matrices. In the first sample, the sum of the coefficients, Table III, was 162.39 (25 variables, disregarding the signs). In the reduced matrix, Table V (24 variables), it was 115.37. In the second sample, the sum of the coefficients, Table IV (21 variables), was 125.89. In the reduced matrix, Table VI (20 variables), the sum was 74.61. This greater change in the second sample supports the hypothesis that lack of instructions during the second administration increased the halo due to pay grade, and thus its contribution to the total variance.
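The matrix sums quoted above are totals of coefficients with signs disregarded. A minimal sketch, assuming each distinct pair of variables is counted once (the report does not say whether one or both halves of the symmetric matrix were summed); the small matrix is hypothetical.

```python
def sum_abs_offdiag(matrix):
    """Sum of |r| over distinct variable pairs (upper triangle), signs disregarded."""
    k = len(matrix)
    return sum(abs(matrix[i][j]) for i in range(k) for j in range(i + 1, k))


def percent_reduction(before, after):
    """Percentage drop from one matrix sum to another."""
    return 100.0 * (before - after) / before


R = [[1.0, 0.5, -0.2],
     [0.5, 1.0, 0.3],
     [-0.2, 0.3, 1.0]]
print(round(sum_abs_offdiag(R), 2))                 # 1.0
print(round(percent_reduction(162.39, 115.37), 1))  # first sample, Tables III and V: 29.0
```

Because the reduced matrices have one variable fewer than the originals, part of each drop presumably reflects the coefficients that involved pay grade itself, not only the shrinkage of trait-with-trait coefficients.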

As testimony to the fact that the within pay grade variance technique was effectively removing the variance due to pay grade level alone, the correlation of the rating scale variables with the trait of Age may be examined. In both samples, age correlated very highly with pay grade (.03 in the first sample, .94 in the second). Removing the effects of pay grade, then, should be tantamount to removing the effects of age, and the correlations of Age with the rating scale variables should distribute themselves rather closely about zero. Examination of the correlations with age in Tables V and VI reveals that in both samples this did occur, over three-fourths of the coefficients having values between plus and minus .15. From this evidence it appears that the reducing procedure accomplished its purpose.

Intercorrelation of traits. Procedure III (pay grade partialled out).

As an additional check on the technique of reducing to within pay grade variance, however, it was decided to compare those results with those obtained by partialling out statistically the effects of pay grade. This was accomplished by starting with the original matrices and removing the pay grade variance from each coefficient by conventional partialling techniques.(4) The partialled matrices resulting may be seen in Tables VII and VIII for the first and second samples respectively.
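The "conventional partialling technique" is presumably the first-order partial correlation, r_ij.p = (r_ij - r_ip r_jp) / √((1 - r_ip²)(1 - r_jp²)), where p is pay grade. A sketch that applies it to every pair in a matrix and drops the pay grade variable itself; the three-variable matrix is hypothetical.

```python
import math


def partial_out(R, p):
    """First-order partial correlations of all variables except p, holding p constant."""
    keep = [i for i in range(len(R)) if i != p]
    result = []
    for i in keep:
        row = []
        for j in keep:
            if i == j:
                row.append(1.0)
                continue
            num = R[i][j] - R[i][p] * R[j][p]
            # Undefined if a variable correlates perfectly with p.
            den = math.sqrt((1 - R[i][p] ** 2) * (1 - R[j][p] ** 2))
            row.append(num / den)
        result.append(row)
    return result


# Hypothetical: two traits (0, 1) and pay grade (2).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
print(round(partial_out(R, 2)[0][1], 3))  # 0.467
```

Unlike the reducing procedure, this correction depends only on each variable's correlation with pay grade across the whole sample, which is the contrast drawn in the next paragraph.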

In comparing specific corresponding coefficients in the reduced and partialled matrices, considerable variation can be seen. Apparently the two procedures did not produce precisely the same effects. However, a comparison of the sums of these matrices for both samples yields striking similarities. In the first sample, the sum of the reduced matrix, Table V, was 115.07 and that of the partialled matrix, Table VII, was 117.67. In the second sample, the sum of the reduced matrix, Table VI, was 74.61 while that of the partialled matrix, Table VIII, was 79.86. Thus the two methods removed approximately the same amount of variance in each sample, but they treated individual coefficients somewhat differently. The explanation for this must lie in the fact that what was accomplished precisely by mathematics was dependent on the correlation of pay grade with the other variables derived from the entire sample, while in the reducing procedure each individual case required a decision as to who was high and who was low within each pay grade. In other words, if the same amount of variance were due to pay grade in the case of each coefficient, then the partialling procedure should correspond very closely to the reducing procedure.

It is believed that this was not the case in every instance, and thus discrepancies between the matrices resulting from the two procedures were to be expected.

(4) J. P. Guilford, Fundamental Statistics in Psychology and Education (New York: McGraw-Hill Book Company, 1942), pp. 268-271.

As a final comment on the partialling procedure, it should be pointed out that the correlations of Age with the rating scale variables again distributed themselves about zero, as they did in the reducing procedure. Again, over 70 percent of the correlations with age were within the range of plus or minus .15. There were a few more large coefficients than with the reducing procedure, however, and a larger number of negative coefficients.

The discussion on the reducing and partialling procedures should not be concluded without recognizing that these techniques probably removed too much variance. That is to say, if many of the qualities under consideration are genuinely related to pay grade, then removal of the effects of pay grade eliminated real, as well as spurious, variance. Evidence that this was the case occurred particularly during the analysis of the second sample of data. When pay grade was partialled from this matrix, a substantial number of negative correlations arose between the rating scale traits and such variables as Age, Length in Service, and scores in the General Classification Test. This, in turn, required a reflection of those variables during the factoring procedure, which led eventually to a few rather substantial negative factor loadings for these variables. These loadings cannot be interpreted in the usual sense, and are regarded as artifacts of the reducing and partialling procedures.


Results of the factor analyses. Having computed the intercorrelations of traits under three conditions for each of the two samples, six factor analyses were next performed using the Thurstone method. The purpose of these analyses was two-fold: (1) to determine the extent and nature of the factor structure of the rating scale under each of the three conditions of correlation, and (2) to determine how similar the factor structure was from one sample to the next.


Extraction of factors was continued as long as the cross-product of any pair of factor loadings exceeded the standard error of the zero-order coefficient between the corresponding pair of traits in the original matrix. For the sake of consistency and in order to introduce a minimum of assumptions, all rotations were orthogonal and the criteria for rotation were positive manifold and simple structure.
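The stopping rule can be sketched together with a single step of Thurstone's centroid method, which is assumed here to be what "the Thurstone method" refers to. Further assumptions: each diagonal cell is filled with the highest absolute correlation in its column, no sign reflection is done (adequate only for a first factor when the correlations are mostly positive), and the standard error of a zero-order r is taken as (1 - r²)/√N. The matrix and N are hypothetical.

```python
import math


def centroid_factor(R):
    """Loadings on one centroid factor of a correlation (or residual) matrix."""
    k = len(R)
    M = [row[:] for row in R]
    for j in range(k):
        # Communality estimate: highest absolute off-diagonal r in the column.
        M[j][j] = max(abs(M[i][j]) for i in range(k) if i != j)
    col_sums = [sum(M[i][j] for i in range(k)) for j in range(k)]
    total = sum(col_sums)
    return [s / math.sqrt(total) for s in col_sums]


def residual(R, loadings):
    """Correlations left after removing one factor's contribution."""
    k = len(R)
    return [[R[i][j] - loadings[i] * loadings[j] for j in range(k)]
            for i in range(k)]


def keep_extracting(loadings, R_original, n):
    """True while some cross-product of loadings exceeds the standard error
    of the corresponding zero-order r in the original matrix."""
    k = len(loadings)
    for i in range(k):
        for j in range(i + 1, k):
            se = (1 - R_original[i][j] ** 2) / math.sqrt(n)
            if abs(loadings[i] * loadings[j]) > se:
                return True
    return False


R = [[1.00, 0.56, 0.48],
     [0.56, 1.00, 0.42],
     [0.48, 0.42, 1.00]]
first = centroid_factor(R)
print([round(a, 2) for a in first])                 # [0.75, 0.72, 0.65]
print(keep_extracting(first, R, n=167))             # True: keep this factor
print(round(residual(R, first)[0][1], 3))           # 0.015
print(keep_extracting([0.1, 0.1, 0.05], R, n=167))  # False: stop
```

A second factor would be extracted from the residual matrix in the same way, after reflecting signs, and the check repeated until no cross-product clears its standard error.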

Rotations were accomplished according to a graphic method described by Zimmerman.(5)
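Zimmerman's procedure is graphical: a pair of factor axes is plotted and turned through an angle read from the plot. The arithmetic behind one such turn is an ordinary plane rotation, sketched below with hypothetical loadings. Because the axes stay at right angles, each trait's communality (the sum of its squared loadings) is unchanged, while the loadings themselves move toward simple structure and positive manifold.

```python
import math


def rotate_pair(loadings, p, q, degrees):
    """Rigidly rotate factor axes p and q by `degrees`, keeping them orthogonal.

    loadings: one row per trait, one column per factor.
    """
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    rotated = []
    for row in loadings:
        new = list(row)
        new[p] = row[p] * cos_t + row[q] * sin_t
        new[q] = -row[p] * sin_t + row[q] * cos_t
        rotated.append(new)
    return rotated


def communalities(loadings):
    """Sum of squared loadings for each trait."""
    return [sum(a * a for a in row) for row in loadings]


# Hypothetical unrotated loadings: both traits load on factor I,
# with opposite signs on factor II.
F = [[0.6, 0.6],
     [0.6, -0.6]]
G = rotate_pair(F, 0, 1, -45)
print([[round(x, 2) for x in row] for row in G])
print([round(h, 2) for h in communalities(G)])  # [0.72, 0.72]
```

After the turn each trait loads about .85 on one factor and essentially zero on the other, the simple-structure pattern the rotation criteria aim for.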

In assembling the traits which comprised the rating scale, it was felt that there were temperamental components of performance aboard submarines which were of as much importance as technical knowledge and skills. Theoretically, at least, the temperamental components should be independent of the technical components, although for any given population they might be correlated. For this reason, orthogonal rotations were utilized which, in fact, gave very satisfactory solutions for the original data from both samples. For the sake of determining whether or not the factor structure was altered by the reducing and partialling procedures, orthogonal rotations were employed in those analyses also. Here the results were not nearly as satisfactory from the standpoint of identifying factors. It may be that an oblique solution would have resulted in more meaning for these sets of data. However, such an investigation was not considered advisable because of the labor required.

In Tables IX, X and XI (Appendix A), the results of the factorial analyses performed on the data from the first sample may be seen. In Tables XII, XIII and XIV are the corresponding results from the second sample. The most remarkable feature of these results is the fact that the factor structure increased in dimensionality when variance due to pay grade was removed. In the first sample, there were four factors before the effects of pay grade were removed, five factors in the reduced data, and six in the data in which pay grade was held constant by partial correlation. In the second sample this increase was from three to five and seven factors for the matrices from which pay grade was removed. No obvious explanation for this is readily available. One reasonable hypothesis might be, however, that in the first analysis for each sample, pay grade acted as a general factor and increased the correlation between factors to such an extent that two or more factors emerged as one.

(5) Zimmerman, W. S., "A simple graphical method for orthogonal rotation of axes," Psychometrika, 11:51-55, 1946.


Identification of factors. The most readily identifiable factors were extracted from the original matrices of both samples. In both these analyses a factor called Technical Competence emerged which was practically synonymous with pay grade. The traits and other variables having the highest loadings in these factors are listed below:

First Sample: Technical Competence

    Pay Grade                        .97
    Knowledge of Job                 .70
    Age                              .75
    Leadership                       .69
    Judgment and Common Sense        .64
    Ability to Troubleshoot          .63

Second Sample: Technical Competence

    Pay Grade                        .99
    Age                              .95
    Length in Service                .92
    Judgment and Common Sense        (illegible)
    Knowledge of Job                 .80
    Leadership                       .70
    Quality of Work                  .74
    Overall Efficiency in Rate       .73

In both original analyses a factor also emerged which seemed best identified as Personal Adjustment of some kind. Involved in it were attitude toward the job and shipmates, effort, sincerity, dependability, etc. The traits having the highest loadings in this factor for each sample are listed below:


First Sample: Personal Adjustment

    Cooperation                      .79
    Application and Initiative       .77
    Ability to be Taught             .73
    Adaptability                     .73
    Sincerity in the Job             .69
    Overall Efficiency               .69
    Care of Equipment                .67
    Dependability                    .64

Second Sample: Personal Adjustment

    Ability to be Taught             .75
    Watch Standing                   .70
    Cooperation                      .66
    Application and Initiative       .65
    Care of Equipment                .64
    Sincerity in the Job             .64
    Social Adjustment                .62
    Quality of Work                  .60
    Overall Efficiency               .60

Also identified from the original data of the first sample was a factor which appears best described as Carefulness or Neatness in Work and Person. Traits having the highest loadings were:

First Sample: Carefulness or Neatness

    Quality of Work                  .56
    Neatness of Work                 (illegible)
    Care of Equipment                (illegible)
    Overall Efficiency in Rate       .50
    Discipline                       .45
    Neatness of Appearance           .40

This rather well defined factor was not identified in the analysis of the data from the second sample.


The fourth and final factor emerging from the original data of the first sample appeared to be some sort of Efficiency or Job Performance factor. The traits with principal loadings were:

First Sample: Efficiency or Job Performance

    Neatness of Work                 .59
    Overall Efficiency in Rate       .57
    Watch Standing                   .55
    Overall Efficiency               .46
    Sincerity in the Job             .45
    Manual Skill                     .44
    Ability to Troubleshoot          .37
    Judgment and Common Sense        .37
    Quality of Work                  .34


The third and final factor emerging from the original data of the second sample was found only in that sample. While all loadings were low, it appeared to be best identified with Maturity or Experience:

Second Sample: Maturity or Experience

    Length on Board                  .46
    Education                        .41
    Knowledge of Job                 .30
    Judgment                         .35
    Leadership                       .33


The factors extracted from the reduced and partialled matrices were very difficult to identify. Traces of the factors extracted from the original data, listed above, could be seen throughout. The loadings, of course, were very different due to the removal of pay grade variance. An attempt was made to classify the factors into two broad groups: those of a Technical nature and those of an Adjustment nature. Even with such a broad classification, some factors seemed to be as much a member of one class as of the other. Because of the possibly limited significance of these patterns of loadings, they will not be reported here.

The difficulty in identifying these last groups of factors probably rests partly with the fact that too few experimental variables were included for identification of the large number of factors which theoretically might be rated. To this difficulty may be added the fact that the rating scale traits were not sufficiently definitive to permit a distinction between two factors which were composed in large part of the same traits. For example, in the field of aptitudes, identification of a factor in which many tests have loadings can be made with considerable confidence if a relatively pure test (such as number operations or vocabulary) is highly saturated with the factor.

In the present analysis, however, one would hesitate to identify a factor as Cooperation, for example, simply because the trait called Cooperation had the highest loading in that factor.

Identification of factors extracted from the reduced and partialled
matrices was made more difficult also by the fact that these procedures
removed a good proportion of genuine as well as spurious variance. This
left many of the traits with reduced loadings in certain factors and
decreased the differences in loadings between traits. The result was
less distinct patterns of loadings, which were correspondingly more
difficult to identify.
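The report does not state how pay grade variance was removed from the "reduced and partialled" matrices. Assuming the standard first-order partial correlation was used, the sketch below shows why partialling shrinks loadings: any covariation that two traits share through pay grade is removed, whether it is genuine or spurious. The inputs are hypothetical; the .62 is borrowed from the pay grade/rating correlation reported later for the preliminary scale, purely for illustration.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z (e.g. pay grade) held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        # z perfectly predicts x or y, so nothing is left to correlate.
        raise ValueError("partial correlation undefined when |r_xz| or |r_yz| is 1")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical inputs: two traits correlate .60 across pay grades, and
# each correlates .62 with pay grade.
print(round(partial_r(0.60, 0.62, 0.62), 2))  # 0.35
```

Under these assumed values, a trait-to-trait correlation of .60 falls to about .35 once pay grade is held constant, which is the kind of shrinkage in loadings described above.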
Chapter IV


RESULTS OF THE SHIPBOARD USE OF THE RATING SCALE
3. Relationships with Other Performance Measures


The performance rating scale was the first of three criterion measures
developed in the study of shipboard performance of Navy enlisted personnel.

The results reported in previous chapters indicate that it provides a means
for obtaining reliable, discriminating measures of at least two general aspects
of shipboard performance: one technical and one adjustive in nature.

Later in the study, performance check lists were developed to determine
how well officers and petty officers could rate their men on ability to
perform specific tasks from their jobs.* These were much more specific and
technical in nature than the general traits of the rating scale.

Finally, actual performance tests were constructed around tasks representative
of the EM and EN jobs aboard submarines.** These provided an opportunity to
compare rated performance with actual tested performance.


Correlation of rating scale and check list evaluations

It would be expected that scores from the two rating measures would
correlate appreciably, simply because scores on both were substantially
related to pay grade. Part of the relationship of pay grade to the ratings
reflected the genuinely higher average performance level of men in the higher
pay grades, but part, no doubt, also reflected the tendency of some raters
to rate men high simply because they were in a higher pay grade and "should"
be able to perform better. It was decided, therefore, that the relationships
between the two measures should be investigated on a within pay grade basis.
In order to have an estimate of the highest likely relationship, however, it
was decided to correlate, across all pay grades, scores from the check lists
with perhaps the most technical rating scale trait, Ability to Troubleshoot.

*See Part III, this report.
**See Part I, this report.

The results, broken down by officer and enlisted rater, and by observed
and total check list scores,* appear in Table XV. The very high
relationships obtained, particularly in the case of check list total scores,
suggest that the check list and the technical traits of the rating scale were
measuring essentially the same things when scores were taken across pay grades.


TABLE XV

CORRELATIONS BETWEEN SCORES ON THE RATING SCALE TRAIT
"ABILITY TO TROUBLESHOOT" AND VARIOUS CHECK LIST SCORES
(ALL PAY GRADES INCLUDED)

Score                                EM                  EN

Enlisted rater, observed scores      .7? (N = 229)       .7? (N = 246)
Officer rater, observed scores       .53 (N = 1??)       .52 (N = 42)
Enlisted rater, total scores         .89 (N = 234)       .63 (N = 293)
Officer rater, total scores          .90 (N = ?59)       .79 (N = 215)
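Why across-grade correlations such as those in Table XV can run high is easiest to see with a toy example. The sketch below uses invented scores (not data from this study) for two pay grades in which check list and rating scores are unrelated within each grade, but the higher grade scores higher on both. Pooling the grades produces a strong correlation; centering each score on its own pay grade mean (a pooled within-group correlation, assumed here as a simple stand-in for a within pay grade analysis) removes it.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def within_group_r(x, y, groups):
    """Correlation after centering x and y on their own group (pay grade) means."""
    cx, cy = [], []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        gx = sum(x[i] for i in idx) / len(idx)
        gy = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            cx.append(x[i] - gx)
            cy.append(y[i] - gy)
    return pearson(cx, cy)


# Hypothetical scores for two pay grades (not data from this study).
grade = ["striker"] * 4 + ["third"] * 4
check_list = [1, 2, 3, 4, 6, 7, 8, 9]
rating = [3, 1, 4, 2, 8, 6, 9, 7]

print(round(pearson(check_list, rating), 3))                # 0.833 (pooled)
print(round(within_group_r(check_list, rating, grade), 3))  # 0.0 (within grade)
```

The entire pooled correlation here comes from the difference between grade means, which is why the within pay grade analyses that follow are the more demanding test.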


The within pay grade study was made on striker and third-class petty
officers only. These were the groups singled out for special study as a
result of the overall design of the project. (As a part of this study,
specially designed aptitude tests had been administered to men entering the
Submarine School, New London. It was intended that the relationships between
these tests and the shipboard criterion measures would be examined. Therefore,
attention was focused on strikers and thirds, since few higher rated
petty officers would have been recent Submarine School graduates.)

*See Part III, this report, for a discussion of "observed" check list scores.


The results appear in Table XVI. Correlations are reported for strikers
and thirds separately, and also for strikers and thirds combined (that is,
across pay grade) to study the influence of that variable at the lower pay
grade levels.


TABLE XVI

RELATIONSHIPS BETWEEN CHECK LIST
AND RATING SCALE TOTAL SCORES

(Striker and Third-Class Petty Officers. Biserial Correlations)

                                                     Strikers                        Strikers
                                  Strikers  Thirds   & Thirds    Strikers  Thirds    & Thirds

Enlisted rater, observed scores     .65      .53      .5?          .20      .42       .50
Officer rater, observed scores      .13      .40      .37          .25      .40       .44
Enlisted rater, total scores        .?6      .45      .61          .41      .61       .56
Officer rater, total scores         .5?      .5?      .54          ?        .23       .40

The last two lines in Table XVI reveal the correlations between total scores
on the check lists and total rating scale scores to be substantial even within
the lower pay grades, and in spite of the fact that some of the rating scale
variance was non-technical in nature. The inclusion of pay grade variance does
not appear to increase these relationships systematically at these job levels.
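Table XVI reports biserial coefficients, but the report does not say which variable was dichotomized or where the split was made. Assuming one score is continuous and the other has been split into an upper and a lower group on an underlying normal variable, the textbook biserial coefficient is r_b = ((M_p - M_q) / s_y) * (pq / y), where M_p and M_q are the continuous-score means of the two groups, s_y is the standard deviation of all continuous scores, p and q are the group proportions, and y is the height of the unit normal curve at the point dividing it into p and q. A minimal sketch, with the function name and toy data as illustrative assumptions:

```python
from statistics import NormalDist, pstdev


def biserial(scores, upper):
    """Biserial r between continuous scores and a True/False split.

    Assumes the split (True = upper group) was made on a variable that is
    normally distributed underneath.
    """
    hi = [s for s, u in zip(scores, upper) if u]
    lo = [s for s, u in zip(scores, upper) if not u]
    if not hi or not lo:
        raise ValueError("both the upper and the lower group must be non-empty")
    p = len(hi) / len(scores)
    q = 1.0 - p
    # Height of the unit normal curve at the point cutting off p above and q below.
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    mean_diff = sum(hi) / len(hi) - sum(lo) / len(lo)
    return mean_diff / pstdev(scores) * (p * q / y)


# Toy data (not from this study): the upper half of the split has higher scores.
print(round(biserial([1, 2, 3, 4], [False, False, True, True]), 2))  # 1.12
```

Unlike a Pearson coefficient, the biserial can exceed 1.0 in small or non-normal samples, as the toy example shows, so individual values are best read with some caution.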

As a final check on the relationships between check list and rating
scale, scores on each of the twenty individual traits on the preliminary rating
scale were correlated with total check list scores. These results appear in
Table XVII. It can be seen that, even on a within pay grade basis, the more
technically oriented rating scale traits tended to correlate higher with the
check list scores than those which were more adjustive or attitudinal in nature.
TABLE XVII

RELATIONSHIPS BETWEEN PERFORMANCE CHECK LIST TOTAL SCORES
AND SCORES ON INDIVIDUAL TRAITS OF THE PRELIMINARY RATING SCALE
(Within Pay Grade, N = 187)

                                        Correlation (Tetrachoric)
Rating Scale Traits                     with Check List Scores

 1. Social Adjustment                          -.10
 2. Quality of Work                             .32**
 3. Neatness of Appearance                      .07
 4. Cooperation                                 .14
 5. Watch Standing                              .32**
 6. Knowledge of Job                            .49**
 7. Discipline                                  .29*
 8. Application and Initiative                  .16
 9. Judgment and Common Sense                   .40**
10. Dependability                               .40**
11. Adaptability                                .37**
12. Leadership                                  .35**
13. Overall Efficiency                          .46**
14. Neatness of Work                            .40**
15. Ability to be Taught                        .?1
16. Care of Equipment                           .35**
17. Ability to Troubleshoot                     .41**
18. Sincerity in the Job                        .43**
19. Manual Skill                                .43**
20. Overall Efficiency in Rate                  .31**

 * Significant at the 5% level.
** Significant at the 1% level.


For example, Knowledge of the Job, Judgment and Common Sense, Overall
Efficiency, Ability to Troubleshoot, and Manual Skill all correlated with total
check list scores at the 1% level of significance on a within pay grade basis.
However, certain traits associated with the adjustment factor in the rating scale,
including Social Adjustment, Neatness of Appearance, Cooperation, Application and
Initiative, and Ability to be Taught, did not correlate significantly.
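Tables XVII through XIX report tetrachoric coefficients, which estimate the correlation between two normally distributed variables from a 2 x 2 table after each has been split into "high" and "low". The report does not describe how its coefficients were computed. The sketch below uses the classical cosine-pi approximation, r_t ≈ cos(π / (1 + sqrt(ad / bc))), rather than the exact bivariate-normal solution; the function name and the cell layout are assumptions for illustration.

```python
import math


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric correlation of a 2 x 2 table.

    Assumed cell layout (counts of men):
                        check list high   check list low
        trait high             a                 b
        trait low              c                 d
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    if a * d == 0 and b * c == 0:
        raise ValueError("table is degenerate; correlation undefined")
    if b * c == 0:
        return 1.0   # no off-diagonal (discordant) cases
    if a * d == 0:
        return -1.0  # no on-diagonal (concordant) cases
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


print(round(tetrachoric_cos_pi(40, 10, 10, 40), 2))  # 0.81
```

For example, 40 men high on both measures, 40 low on both, and 10 in each off-diagonal cell give cos(π/5), about .81. Because the coefficient is estimated from a dichotomy, its sampling error is larger than that of a Pearson r on the same N.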
Correlations between rating scale scores and performance test scores


As a final study of the nature of rating scale scores as criteria of shipboard
performance, correlations were computed with scores on the job sample tests
and  written  job  knowledge  examinations.  Again,  these  relationships  were  studied 
first  as  they  existed  for  all  pay  grades  and  later  as  they  existed  within  the 
striker  and  third  class  petty  officer  levels. 

Table XVIII reveals the across pay grade relationships between the ten
traits on the final version of the rating scale and average job sample test scores
and written job knowledge test scores. Generally, the traits Ability to Troubleshoot,
Knowledge of the Job, Judgment and Common Sense, Leadership, and Care of
Equipment were more highly related to performance test scores than were other
traits. This tendency was more pronounced for EM's than for EN's.


TABLE XVIII

CORRELATIONS BETWEEN ACROSS-PAY GRADE RATING SCALE TRAIT SCORES,
AVERAGE JOB SAMPLE TEST SCORES, AND JOB KNOWLEDGE TEST SCORES
(Tetrachoric Coefficients)

                                         EM's*                      EN's*
                               Average        Job          Average        Job
                              Job Sample    Knowledge     Job Sample    Knowledge
Characteristic                  Score         Score         Score         Score

Cooperation                      .25           .36           .38           .40
Knowledge of Job                 .50           .48           .34           .51
Discipline                       .30           .32           .36           .35
Application and Initiative       .?1           .36           .32           .40
Judgment and Common Sense        .53           .47           .38           .41
Leadership                       .57           .52           .45           .44
Neatness of Work                 .43           .44           .46           .45
Care of Equipment                 ?             ?             ?             ?
Ability to Troubleshoot          .?2           .53           .55           .56
Sincerity in Job                 .43           .36           .44           .47

* For EM's, N = 315 Strikers through CPO's; for EN's, N = 403 Strikers through CPO's.
A similar analysis was next done on a within pay grade basis, including
strikers and third class petty officers only. These results appear in Table XIX.
It will be noticed that the relationships are smaller in every case. The
correlations still are appreciable for the EM's, but in the case of EN's, the
correlations of job sample test scores with the rating scale traits dropped to nearly
zero. This was not true of the correlations between rating traits and the written
job knowledge test, however. For the written test, the drop in correlations was
quite similar for both EM's and EN's.


TABLE XIX

CORRELATIONS BETWEEN WITHIN-PAY GRADE RATING SCALE TRAIT SCORES,
AVERAGE JOB-SAMPLE TEST SCORES, AND WRITTEN JOB-KNOWLEDGE TEST SCORES

(Tetrachoric Coefficients)

                                         EM's*                      EN's*
                               Average        Job          Average        Job
                              Job Sample    Knowledge     Job Sample    Knowledge
Characteristic                  Score         Score         Score         Score

Cooperation                      .20           .20           .02           .27
Knowledge of Job                 .48           .25           .16           .32
Discipline                       .26          -.16           .04           .10
Application and Initiative        ?             ?             ?             ?
Judgment and Common Sense        .42            ?             ?             ?
Leadership                       .38           .27           .04           .25
Neatness of Work                 .45           .27           .10           .28
Care of Equipment                .38           .23           .10           .35
Ability to Troubleshoot          .39           .32           .23           .36
Sincerity in Job                 .40           .21           .23           .39

* For EM's, N = 147 Strikers and 3rd Class PO's; for EN's, N = 190 Strikers and
3rd Class PO's.


In addition to the correlations for the individual traits of the rating
scale, total scores also were correlated with average performance test scores.
These values proved to be .14 for EN's and .38 for EM's. Thus, as might be
expected, certain of the more technically oriented rating scale traits correlated
higher with tested performance than did the average of scores from all traits.
Conclusion


Some of the results of the foregoing analyses may be regarded as encouraging
and some as discouraging as far as the relationship between rated and tested
performance is concerned.

Relationships between the two kinds of measures tend to be substantial if
considered across pay grade. This is to be expected, however, because both are
themselves related to pay grade.1

Within pay grade, the correlation was appreciable for EM's but not for
EN's. It would be highly desirable if rated performance accurately reflected
tested performance within each pay grade level. However, this demands reliable
and valid observation of technical performance within very narrow ranges of skill
and job experience. Probably most Navy raters have not gathered sufficient
specific evidence from their incidental observations of men on the job to know the
precise nature and rank order of skills among men in the lower pay grades.2

It is also quite possible that the performance tasks used in this study were
not sufficiently comprehensive to reflect the variety and depth of skills demanded
in the two rates studied. This probably was more true of the EN than of the EM
battery, and the correlations throughout the study lend credence to this notion.

Finally, it must be remembered that the rating scale was designed to reflect
aspects of shipboard performance other than the technical ones. Job attitude or
adjustment to Navy life must be considered highly important, particularly in the
Submarine Navy. These aspects of performance are possibly greater determiners of
the ratings a man receives from his superiors than are technical knowledge and
skill.

1 The correlation between pay grade and performance test scores was .36. That
between pay grade and ratings was .62 in the preliminary and .69 in the revised
preliminary scale.

2 This conclusion also is borne out by the results reported in Part III of this
report.

It would seem, then, that reliable and discriminating ratings by superiors
would form a necessary part of the criterion of any man's shipboard performance.
However, the within-pay-grade correlations between ratings and tests of performance
in this study were so low that it must be considered desirable, in addition, to
actually test a man on his practical factors wherever possible.
Chapter V


CONCLUSIONS

The use by Navy personnel of a rating scale designed to measure the shipboard
performance of submarine personnel has led to the following conclusions:

1.  Reliable  and  discriminating  evaluations  of  performance  aboard  ship  can 
be  obtained  with  a rating  scale  that  is  properly  designed  and  used. 

The "man-to-man" format adopted for this study appears to be practical
and tends to promote better ratings than conventional scales.

2.  Reliable  ratings  can  be  obtained  on  single  traits  as  well  as  on  total 
scores;  reasonably  reliable  ratings  can  be  obtained  within-pay-grade, 
even  at  the  striker  level. 

3.  Navy  raters  need  specific  instructions  on  the  following  if  their 
ratings  are  to  be  highly  useful: 

a)  halo  effect,  particularly  that  due  to  pay  grade; 

b) the logical error leading to high intercorrelation of
traits;

c)  the  need  for  maximum  discrimination  in  ratings; 

d)  the  need  for  a realistic  average  in  the  ratings  assigned. 

To a certain extent, written instructions accomplish the needed
education of raters. Personal instruction is to be preferred, however.

4. At least two broad aspects of shipboard performance can be tapped by
means of ratings. In this study these appeared to be technical competence
or job skill on the one hand, and personal adjustment or attitude
toward the job on the other.

5. Appraisals of performance made with the rating scale correlate with
actual tests of performance to a very limited extent if the range of
experience is restricted to a single pay grade. This is apparently
more true for some rates and pay grades than others.

6. Inter-rater differences in leniency must be accounted for if high
agreement and reliability are to be realized. This is particularly
true if such agreement is sought on a within pay grade basis.
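Conclusion 6 does not specify how leniency differences should be handled. One simple approach, assumed here for illustration only, is to express every rating as a deviation from the mean of the rater who gave it, so that a lenient and a severe rater who order the same men identically produce identical adjusted scores. The function name, rater labels, and man identifiers below are hypothetical.

```python
from statistics import mean


def remove_leniency(ratings_by_rater):
    """Re-express each rating as a deviation from its own rater's mean.

    ratings_by_rater: {rater_id: {man_id: rating}}  (hypothetical layout)
    Returns the same nested structure with each rater's mean subtracted.
    """
    adjusted = {}
    for rater, ratings in ratings_by_rater.items():
        rater_mean = mean(ratings.values())
        adjusted[rater] = {man: r - rater_mean for man, r in ratings.items()}
    return adjusted


# A lenient and a severe rater who order the same three men identically.
example = {
    "lenient": {"A": 6, "B": 7, "C": 8},
    "severe": {"A": 2, "B": 3, "C": 4},
}
print(remove_leniency(example))
```

Centering removes differences in level only. Differences in spread (one rater using the whole scale, another only its middle) would additionally call for dividing by each rater's standard deviation, which this sketch does not do.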

The  usefulness  of  a rating  device  depends  on  how  it  is  designed  and  how 
well  it  is  used.  The  device  should  include  well  defined  traits  and  performance 
levels.  It  should  be  designed  so  that  men  may  be  readily  compared  with  one 
another  within  the  framework  of  a single  performance  characteristic.  It  should 
promote  discriminating  ratings.  Ideally,  the  raters  should  be  aware  of  the 
common  pitfalls  of  rating  and  the  many  uses  for  good  ratings.  Under  these 
conditions,  highly  useful  evaluations  of  performance  can  be  made. 

Because of the limited correlations between ratings and performance test
scores obtained in this study, it must be concluded, further, that a satisfactory
criterion of shipboard performance demands the inclusion of both ratings by
superiors and actual tests of performance.
FACTOR LOADINGS AFTER ROTATION AND COMMUNALITIES
BEFORE AND AFTER ROTATION (FIRST SAMPLE, N = 187,
ACROSS PAY GRADE)
KNOWLEDGE OF THE JOB

How well does each man know
the technical duties of his rate?
Does he know all the fine points or
does he just know enough to get
by? How often does someone
have to help him get the job done?
Does he know more or less than
most men of the same rate?
Has as much or more job knowledge
as the very best of his rate.
Knows all the fine points of his
job. Never needs help on the job.


Practically never needs help on
the job. Has very little to learn
about his job except for a few of
the most technical aspects.


Has adequate knowledge to turn
in a satisfactory job. Sometimes
needs help on the job since he
still has a few things to learn.


Has much to learn about his job.
Needs supervision very frequently
since he lacks much of the
fundamental knowledge necessary to do
the job alone.


INSTRUCTIONS  TO  RATERS 
5) This is important. Do not rate one man higher than another simply because he is in a
higher pay grade. Experience and ability generally run together, but they are not
always perfectly related. These ratings are to be used for research purposes only and
will not be a factor in anyone's advancement. We sincerely request your frank and
honest impressions.
ct frequently is a real prob-


rely very highly on his ability to


for advice. Shows no ambition to
lead others.


INSTRUCTIONS: 

1) If you were transferred to another ship from the one you are now on and were to be the division officer
or senior petty officer, you would be interested in both the petty officer qualities (leadership, cooperation,
initiative, sincerity, etc.) of your men and in their ability in rating (technical skill).

Considering both petty officer qualities and ability in rating, which one of these men would you most want
to take with you if you were transferred to another ship? Which man would be of greatest value to you?
Indicate your first choice by placing number 1 opposite the man's name in the column headed "Order of
Choice." Place a 2 opposite your second choice, a 3 opposite your third choice, and so on until you have
assigned a number to each man in the gang.

2) In the space headed "Remarks," make any comment that you feel should be considered in evaluating the
ratings you have assigned to the men in your gang. For example,

"Only been aboard 60 days"

"Stands only look-out watches"

"Will never make a submarine sailor"

"Finest man I have ever known"


